Ratio bias

Published

May 11, 2021

When you compute a ratio of two random numbers, the ratio is biased in general, unless the relative error on the denominator is very small. This is a general consequence of the fact that any non-linear transformation of a random variable introduces a bias in general.

The good news is that we can correct for this bias to first order with a simple formula, and thus get an approximately unbiased result.

import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(1)

a_true = 100.
b_factor = 20.

a = rng.poisson(a_true, size=10000)
b = rng.poisson(a_true / b_factor, size=len(a)) * b_factor
b[b==0] = np.nan
plt.hist(a / b, bins=50, label="toys")
plt.axvline(1, ls="--", color="k", label="true")
plt.axvline(np.nanmean(a/b), ls="-", color="r", label="avg")
plt.legend();

@np.vectorize
def ratio_bias(a, a_var, b, b_var):
    if np.isnan(a) or np.isnan(b) or a == 0 or b == 0:
        return np.nan

    scale_a = a_var / a
    n_eff_a = a / scale_a
    scale_b = b_var / b
    n_eff_b = b / scale_b

    rng = np.random.default_rng(1)
    a_toy = rng.poisson(n_eff_a, size=10000) * scale_a
    b_toy = rng.poisson(n_eff_b, size=len(a_toy)) * scale_b
    b_toy[b_toy == 0] = np.nan
    r_toy = np.nanmean(a_toy / b_toy)
    return r_toy / (a / b)
a = 10.
b = 10.

for a_rel_err in (0.01, 0.5):
    b_rel_err = np.geomspace(0.01, 0.5, 10)
    bias = ratio_bias(a, (a * a_rel_err) ** 2, b, (b * b_rel_err) ** 2)
    plt.plot(b_rel_err, bias, label=f"$\\sigma(a)/a = {a_rel_err:.2f}$")

# analytical formula for bias
bias = (1+b_rel_err**2)
plt.plot(b_rel_err, bias, label=f"analytical")

plt.ylabel(r"$\langle a/b \rangle$ / truth")
plt.xlabel(r"$\sigma(b)/b$")
plt.legend();

The analytical formula is a first order correction, therefore it fails to describe the bias for large relative uncertainties.